Car Analysis Project

Author

Ikenna Atum

Published

May 12, 2025

An Analysis of What Influences Car Price

Introduction

The purpose of this project is to analyze the determinants of car price. During the pandemic, used car prices surged because of supply shortages; this article gives useful background: https://spectrumlocalnews.com/nys/central-ny/news/2024/05/02/used-vehicles-during-pandemic. That led me to one question: “What are the main determinants of the price of a car?” This project answers that question with machine learning models and with data visualizations that illustrate the findings. A section at the end of the analysis uses the model developed here to predict car prices.

Data Description

The data comes from Kaggle. This dataset encompasses details such as the year, make, model, trim, body type, transmission type, VIN (Vehicle Identification Number), state of registration, condition rating, odometer reading, exterior and interior colors, seller information, Manheim Market Report (MMR) values, selling prices, and sale dates.

Results

Many variables influence car price, ranging from the odometer reading and the body type all the way to the trim level. Market conditions also play a huge role, as we will see below. In the fitted model, each additional dollar of Manheim Market Report (MMR) value is associated with roughly $0.98 more in selling price, so a change in the MMR value carries through almost one-for-one to the price of the vehicle. Supply-side shortages or a widely successful marketing campaign could therefore influence car price. Further research will require larger data sets and more variables.

Further Analysis

To build on this analysis, we could examine how the time of year affects consumer demand for cars. If selling prices are consistently higher at certain points of the year, buyers can use this to their advantage and get a better deal by purchasing at other times.
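A first step toward that seasonal analysis would be turning the `saledate` strings (for example `Tue Dec 16 2014 12:30:00 GMT-0800 (PST)`) into dates. The following is a minimal sketch, assuming every valid value follows that exact layout and that R runs in an English locale so `%b` matches the month abbreviations. `parse_saledate` is a hypothetical helper and is not part of the original analysis.

```r
# Hypothetical helper: pull the "Dec 16 2014" portion out of a saledate string
# and convert it to a Date. Assumes the fixed layout shown in the lead-in.
parse_saledate <- function(x) {
  date_part <- substr(as.character(x), 5, 15)  # characters 5-15 hold "Mon DD YYYY"
  as.Date(date_part, format = "%b %d %Y")
}

sale_strings <- c(
  "Tue Dec 16 2014 12:30:00 GMT-0800 (PST)",
  "Thu Jan 15 2015 04:30:00 GMT-0800 (PST)"
)
sale_dates <- parse_saledate(sale_strings)

# Month of sale, which could then be used to group selling prices
sale_months <- format(sale_dates, "%m")
```

Values that do not follow the layout come back as `NA`, so they would need to be inspected or dropped before grouping prices by month.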

Predicting the Prices for Cars

Load the data

Code
library(readr)
cars <- read.csv("C:/Users/Ikeat/Desktop/car_prices.csv", stringsAsFactors = TRUE)
str(cars)
'data.frame':   558837 obs. of  16 variables:
 $ year        : int  2015 2015 2014 2015 2014 2015 2014 2014 2014 2014 ...
 $ make        : Factor w/ 97 levels "","acura","Acura",..: 48 48 10 96 10 72 10 17 7 17 ...
 $ model       : Factor w/ 974 levels "","1","1 Series",..: 814 814 11 749 44 85 534 239 62 157 ...
 $ trim        : Factor w/ 1964 levels "","!","& coun fwd",..: 1230 1230 285 1677 500 122 601 64 89 1199 ...
 $ body        : Factor w/ 88 levels "","access cab",..: 78 78 72 72 72 72 72 72 72 12 ...
 $ transmission: Factor w/ 5 levels "","automatic",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ vin         : Factor w/ 550298 levels "","137fa90362e197965",..: 403802 403794 507010 546653 508043 193119 522013 106703 503411 250221 ...
 $ state       : Factor w/ 64 levels "3vwd17aj0fm227318",..: 30 30 30 30 30 30 30 30 30 30 ...
 $ condition   : int  5 5 45 41 43 1 34 2 42 3 ...
 $ odometer    : int  16639 9393 1331 14282 2641 5554 14943 28617 9557 4809 ...
 $ color       : Factor w/ 47 levels "","—","11034",..: 46 46 36 46 36 36 30 30 46 43 ...
 $ interior    : Factor w/ 18 levels "","—","beige",..: 4 3 4 4 4 4 4 4 4 4 ...
 $ seller      : Factor w/ 14263 levels "1 cochran of monroeville",..: 7203 7203 4982 13796 4982 4609 12748 4609 967 3775 ...
 $ mmr         : int  20500 20800 31900 27500 66000 15350 69000 11900 32100 26300 ...
 $ sellingprice: int  21500 21500 30000 27750 67000 10900 65000 9800 32250 17500 ...
 $ saledate    : Factor w/ 3767 levels "","10500","10700",..: 1777 1777 1165 1260 819 1843 2656 1778 817 2111 ...
Code
library(pacman)
p_load(tidyverse, mdsr, sf)
Code
p_load(DT, gt, naniar, ggConvexHull, tidymodels, yardstick, plotly)

Exploring, preparing, and cleaning the data

Summarize the selling price variable

Code
head(cars) %>% gt()
year make model trim body transmission vin state condition odometer color interior seller mmr sellingprice saledate
2015 Kia Sorento LX SUV automatic 5xyktca69fg566472 ca 5 16639 white black kia motors america inc 20500 21500 Tue Dec 16 2014 12:30:00 GMT-0800 (PST)
2015 Kia Sorento LX SUV automatic 5xyktca69fg561319 ca 5 9393 white beige kia motors america inc 20800 21500 Tue Dec 16 2014 12:30:00 GMT-0800 (PST)
2014 BMW 3 Series 328i SULEV Sedan automatic wba3c1c51ek116351 ca 45 1331 gray black financial services remarketing (lease) 31900 30000 Thu Jan 15 2015 04:30:00 GMT-0800 (PST)
2015 Volvo S60 T5 Sedan automatic yv1612tb4f1310987 ca 41 14282 white black volvo na rep/world omni 27500 27750 Thu Jan 29 2015 04:30:00 GMT-0800 (PST)
2014 BMW 6 Series Gran Coupe 650i Sedan automatic wba6b2c57ed129731 ca 43 2641 gray black financial services remarketing (lease) 66000 67000 Thu Dec 18 2014 12:30:00 GMT-0800 (PST)
2015 Nissan Altima 2.5 S Sedan automatic 1n4al3ap1fn326013 ca 1 5554 gray black enterprise vehicle exchange / tra / rental / tulsa 15350 10900 Tue Dec 30 2014 12:00:00 GMT-0800 (PST)
Code
summary(cars$sellingprice)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
      1    6900   12100   13611   18200  230000      12 

Histogram of selling price

Code
hist(cars$sellingprice)

Distribution of the selling price of cars in the dataset

Tables of a few of the categorical variables

Code
table(cars$make)

                      acura         Acura     airstream  Aston Martin 
        10301            25          5901             1            25 
         audi          Audi       Bentley           bmw           BMW 
            8          5869           116            74         20719 
        buick         Buick      cadillac      Cadillac    chev truck 
           14          5107           110          7519             1 
    chevrolet     Chevrolet      chrysler      Chrysler        Daewoo 
          390         60197           209         17276             3 
        dodge         Dodge      dodge tk           dot       Ferrari 
          245         30710             1             1            19 
         FIAT        Fisker          ford          Ford       ford tk 
          865             9           443         93554             1 
   ford truck           Geo           gmc           GMC     gmc truck 
            3            19            25         10613            11 
        honda         Honda        HUMMER       hyundai       Hyundai 
          145         27206           805            20         21816 
   hyundai tk      Infiniti         Isuzu        Jaguar          jeep 
            1         15305           204          1420           111 
         Jeep           kia           Kia   Lamborghini    land rover 
        15372             7         18077             4           129 
   Land Rover     landrover         lexus         Lexus       lincoln 
         1735            27           119         11861            29 
      Lincoln         Lotus      maserati      Maserati         mazda 
         5757             1             3           133           146 
        Mazda      mazda tk      mercedes    mercedes-b Mercedes-Benz 
         8362             1            70             2         17141 
      mercury       Mercury          MINI    mitsubishi    Mitsubishi 
           31          1992          3224           117          4140 
       nissan        Nissan    oldsmobile    Oldsmobile      plymouth 
           71         53946            20           364             7 
     Plymouth       pontiac       Pontiac       porsche       Porsche 
           20            27          4497            19          1383 
          Ram   Rolls-Royce          Saab        Saturn         Scion 
         4574            17           484          2841          1687 
        smart        subaru        Subaru        suzuki        Suzuki 
          396            60          5043             5          1073 
        Tesla        toyota        Toyota    volkswagen    Volkswagen 
           23            95         39871            24         12581 
        Volvo            vw 
         3788            24 
Code
table(cars$body) 

                                     access cab              Access Cab 
                  13195                      62                     232 
     beetle convertible      Beetle Convertible                Cab Plus 
                      7                      52                       4 
             cab plus 4              Cab Plus 4                club cab 
                      1                       5                      22 
               Club Cab             convertible             Convertible 
                    156                    1824                    8652 
                  coupe                   Coupe                crew cab 
                   3150                   14602                    3114 
               Crew Cab             crewmax cab             CrewMax Cab 
                  13280                     120                     445 
            cts-v coupe             CTS-V Coupe             CTS-V Wagon 
                      7                      28                       1 
              cts coupe               CTS Coupe               cts wagon 
                     29                     129                       1 
              CTS Wagon              double cab              Double Cab 
                     13                     350                    1251 
           e-series van            E-Series Van           elantra coupe 
                    368                    1455                      23 
          Elantra Coupe            extended cab            Extended Cab 
                     80                     683                    3824 
          g convertible           G Convertible                 g coupe 
                     74                     249                     330 
                G Coupe                 g sedan                 G Sedan 
                   1263                    1418                    5999 
        g37 convertible         G37 Convertible               g37 coupe 
                      4                      16                       1 
              G37 Coupe           genesis coupe           Genesis Coupe 
                     11                      62                     232 
granturismo convertible GranTurismo Convertible               hatchback 
                      6                       7                    4857 
              Hatchback                king cab                King Cab 
                  21380                      96                     436 
                   koup                    Koup                mega cab 
                     33                     147                      52 
               Mega Cab                 minivan                 Minivan 
                     59                    4166                   21363 
            Navitgation     promaster cargo van     Promaster Cargo Van 
                     26                      10                      49 
        q60 convertible         Q60 Convertible               q60 coupe 
                      4                      38                       4 
              Q60 Coupe                quad cab                Quad Cab 
                     32                     659                    3436 
                Ram Van             regular-cab             regular cab 
                      1                      15                     783 
            Regular Cab                   sedan                   Sedan 
                   4067                   41906                  199437 
               supercab                SuperCab               supercrew 
                    862                    4449                    1610 
              SuperCrew                     suv                     SUV 
                   7423                   24552                  119292 
            transit van             Transit Van         tsx sport wagon 
                      7                      12                       8 
        TSX Sport Wagon                     van                     Van 
                     28                     570                    3958 
                  wagon                   Wagon                 xtracab 
                   2499                   13630                       4 
                Xtracab 
                     40 
Code
table(cars$transmission)

          automatic    manual     sedan     Sedan 
    65352    475915     17544        15        11 
Code
table(cars$state)

3vwd17aj0fm227318 3vwd17aj2fm258506 3vwd17aj2fm261566 3vwd17aj2fm285365 
                1                 1                 1                 1 
3vwd17aj3fm259017 3vwd17aj3fm276741 3vwd17aj4fm201708 3vwd17aj4fm236636 
                1                 1                 1                 1 
3vwd17aj5fm206111 3vwd17aj5fm219943 3vwd17aj5fm221322 3vwd17aj5fm225953 
                1                 1                 1                 1 
3vwd17aj5fm268964 3vwd17aj5fm273601 3vwd17aj5fm297123 3vwd17aj6fm218641 
                1                 1                 1                 1 
3vwd17aj6fm231972 3vwd17aj7fm218440 3vwd17aj7fm222388 3vwd17aj7fm223475 
                1                 1                 1                 1 
3vwd17aj7fm229552 3vwd17aj7fm326640 3vwd17aj8fm239622 3vwd17aj8fm298895 
                1                 1                 1                 1 
3vwd17aj9fm219766 3vwd17ajxfm315938                ab                al 
                1                 1               928                26 
               az                ca                co                fl 
             8741             73148              7775             82945 
               ga                hi                il                in 
            34750              1237             23486              4325 
               la                ma                md                mi 
             2191              6729             11158             15511 
               mn                mo                ms                nc 
             9429             16013              1851             21845 
               ne                nj                nm                ns 
             4013             27784               171                61 
               nv                ny                oh                ok 
            12685              5699             21575                72 
               on                or                pa                pr 
             3442              1155             53907              2725 
               qc                sc                tn                tx 
             1245              4251             20895             45913 
               ut                va                wa                wi 
             1836             12027              7416              9851 
Code
table(cars$color)

                  —     11034      1167     12655     14872     15719     16633 
      749     24685         1         1         1         1         1         1 
    18384     18561     20379     20627      2172      2711      2817      2846 
        1         1         1         1         1         1         1         1 
      339      4802      5001      5705      6158      6388      6864       721 
        1         1         1         1         1         1         1         1 
     9410      9562      9837      9887     beige     black      blue     brown 
        1         1         1         1      9222    110970     51139      6717 
 burgundy  charcoal      gold      gray     green      lime off-white    orange 
     8972       479     11342     82857     11382        15      1449      2078 
     pink    purple       red    silver turquoise     white    yellow 
       42      1561     43569     83389       236    106673      1285 
Code
table(cars$interior)

                  —     beige     black      blue     brown  burgundy      gold 
      749     17077     59758    244329      1143      8640       191       324 
     gray     green off-white    orange    purple       red    silver       tan 
   178581       245       480       145       339      1363      1104     44093 
    white    yellow 
      256        20 
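The tables above show the same category entered under different capitalization (for example `acura` and `Acura`, or `sedan` and `Sedan`), along with blank values. One possible cleanup, which the original analysis does not perform, is to lowercase the labels and treat blanks as missing. The sketch below uses a toy vector rather than the car data, and the names `make_raw`, `make_chr`, and `make_clean` are illustrative assumptions.

```r
# Toy vector mimicking the mixed-case and blank values seen in table(cars$make)
make_raw <- factor(c("Kia", "kia", "Ford", "ford", "BMW", "bmw", ""))

# Lowercase everything so "Kia" and "kia" collapse into one label
make_chr <- tolower(trimws(as.character(make_raw)))

# Treat blank strings as missing so they are not counted as a category
make_chr[make_chr == ""] <- NA

make_clean <- factor(make_chr)
table(make_clean, useNA = "ifany")
```

Aliases such as `vw` and `volkswagen` would still need an explicit mapping, and any later filter written against title-case names (for example `make == "Jeep"`) would have to be adjusted to the lowercase labels.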

Checking for missing values

Code
gg_miss_var(cars)

Code
library(skimr)
Warning: package 'skimr' was built under R version 4.4.3

Attaching package: 'skimr'
The following object is masked from 'package:naniar':

    n_complete
The following object is masked from 'package:mdsr':

    skim
Code
skim(cars)
Warning: There was 1 warning in `dplyr::summarize()`.
ℹ In argument: `dplyr::across(tidyselect::any_of(variable_names),
  mangled_skimmers$funs)`.
ℹ In group 0: .
Caused by warning:
! There were 9 warnings in `dplyr::summarize()`.
The first warning was:
ℹ In argument: `dplyr::across(tidyselect::any_of(variable_names),
  mangled_skimmers$funs)`.
Caused by warning in `sorted_count()`:
! Variable contains value(s) of "" that have been converted to "empty".
ℹ Run `dplyr::last_dplyr_warnings()` to see the 8 remaining warnings.
Data summary
Name cars
Number of rows 558837
Number of columns 16
_______________________
Column type frequency:
factor 11
numeric 5
________________________
Group variables None

Variable type: factor

skim_variable n_missing complete_rate ordered n_unique top_counts
make 0 1 FALSE 97 For: 93554, Che: 60197, Nis: 53946, Toy: 39871
model 0 1 FALSE 974 Alt: 19349, F-1: 14479, Fus: 12946, Cam: 12545
trim 0 1 FALSE 1964 Bas: 55817, SE: 43648, LX: 20757, Lim: 18367
body 0 1 FALSE 88 Sed: 199437, SUV: 119292, sed: 41906, suv: 24552
transmission 0 1 FALSE 5 aut: 475915, emp: 65352, man: 17544, sed: 15
vin 0 1 FALSE 550298 aut: 22, wba: 5, emp: 4, 1ft: 4
state 0 1 FALSE 64 fl: 82945, ca: 73148, pa: 53907, tx: 45913
color 0 1 FALSE 47 bla: 110970, whi: 106673, sil: 83389, gra: 82857
interior 0 1 FALSE 18 bla: 244329, gra: 178581, bei: 59758, tan: 44093
seller 0 1 FALSE 14263 nis: 19693, for: 19162, the: 18299, san: 15285
saledate 0 1 FALSE 3767 Tue: 5334, Tue: 5016, Tue: 4902, Tue: 4731

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
year 0 1.00 2010.04 3.97 1982 2007 2012 2013 2015 ▁▁▁▃▇
condition 11820 0.98 30.67 13.40 1 23 35 42 49 ▃▂▆▇▇
odometer 94 1.00 68320.02 53398.54 1 28371 52254 99109 999999 ▇▁▁▁▁
mmr 38 1.00 13769.38 9679.97 25 7100 12250 18300 182000 ▇▁▁▁▁
sellingprice 12 1.00 13611.36 9749.50 1 6900 12100 18200 230000 ▇▁▁▁▁

Dropping all the missing values

Code
cars <- cars |> drop_na()
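`drop_na()` with no arguments removes every row that has an `NA` in any column. Blank strings are not `NA`, so blank categorical values such as the empty `transmission` entries are kept. The sketch below illustrates this on a toy data frame using the base R function `complete.cases()`, which selects the same rows; the data and object names are assumptions for illustration only.

```r
# Toy data: one NA in a numeric column, one blank string in a text column
toy <- data.frame(
  condition    = c(5, NA, 45, 41),
  transmission = c("automatic", "manual", "", "automatic"),
  sellingprice = c(21500, 21500, 30000, 27750)
)

# Rows with no NA in any column; this is the set that drop_na() would keep
keep <- complete.cases(toy)
toy_complete <- toy[keep, ]

n_dropped <- sum(!keep)                            # rows lost to NA
n_blank   <- sum(toy_complete$transmission == "")  # blank strings that remain
```

For the car data, the regression output reports 546,971 residual degrees of freedom for a model with five coefficients, which implies that 546,976 of the 558,837 rows remained after this step.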

Exploring relationships among features: correlation matrix

Code
cor(cars[c("year", "condition", "odometer", "mmr", "sellingprice")])
                   year  condition   odometer        mmr sellingprice
year          1.0000000  0.3402974 -0.7722589  0.5897473    0.5798719
condition     0.3402974  1.0000000 -0.3181999  0.2813847    0.3219118
odometer     -0.7722589 -0.3181999  1.0000000 -0.5833888   -0.5781809
mmr           0.5897473  0.2813847 -0.5833888  1.0000000    0.9838255
sellingprice  0.5798719  0.3219118 -0.5781809  0.9838255    1.0000000

Visualizing the relationships among features: scatterplot matrix

Code
pairs(cars[c("year", "condition", "odometer", "mmr", "sellingprice")], pch = ".")

More informative scatterplot matrix

Code
library(psych)
Warning: package 'psych' was built under R version 4.4.3

Attaching package: 'psych'
The following objects are masked from 'package:scales':

    alpha, rescale
The following objects are masked from 'package:ggplot2':

    %+%, alpha
Code
pairs.panels(cars[c("year", "condition", "odometer", "mmr", "sellingprice")], pch = ".")

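This scatterplot matrix shows the relationship between each pair of variables: the panels below the diagonal hold the scatterplots, the panels above the diagonal hold the corresponding correlation coefficients, and the histograms on the diagonal show the distribution of each variable.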

Training the model on the data

Code
car_model <- lm(sellingprice ~ year + condition + odometer + mmr , data = cars)

See the estimated beta coefficients

Code
options(scipen = 999) # turn off scientific notation
car_model

Call:
lm(formula = sellingprice ~ year + condition + odometer + mmr, 
    data = cars)

Coefficients:
 (Intercept)          year     condition      odometer           mmr  
91508.718281    -46.018458     37.367641     -0.001178      0.983930  

Evaluating model performance

See more detail about the estimated beta coefficients

Code
summary(car_model)

Call:
lm(formula = sellingprice ~ year + condition + odometer + mmr, 
    data = cars)

Residuals:
   Min     1Q Median     3Q    Max 
-86621   -671     22    760 207165 

Coefficients:
                  Estimate     Std. Error t value            Pr(>|t|)    
(Intercept) 91508.71828065  1929.28797966   47.43 <0.0000000000000002 ***
year          -46.01845796     0.95916648  -47.98 <0.0000000000000002 ***
condition      37.36764135     0.18180663  205.53 <0.0000000000000002 ***
odometer       -0.00117848     0.00007014  -16.80 <0.0000000000000002 ***
mmr             0.98392988     0.00030153 3263.12 <0.0000000000000002 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1681 on 546971 degrees of freedom
Multiple R-squared:  0.9703,    Adjusted R-squared:  0.9703 
F-statistic: 4.462e+06 on 4 and 546971 DF,  p-value: < 0.00000000000000022

Improving the model

Code
cars$mmr2 <- cars$mmr^2

Create final model

Code
car_model2 <- lm(sellingprice ~ year + condition + odometer + mmr + mmr2, data = cars)


summary(car_model2)

Call:
lm(formula = sellingprice ~ year + condition + odometer + mmr + 
    mmr2, data = cars)

Residuals:
   Min     1Q Median     3Q    Max 
-86399   -671     23    760 207161 

Coefficients:
                      Estimate         Std. Error  t value             Pr(>|t|)
(Intercept) 93542.719690983547  1997.976928954122   46.819 < 0.0000000000000002
year          -47.040321224856     0.994025757383  -47.323 < 0.0000000000000002
condition      37.320026013256     0.182210514449  204.818 < 0.0000000000000002
odometer       -0.001134479845     0.000071037618  -15.970 < 0.0000000000000002
mmr             0.985989030856     0.000606209687 1626.482 < 0.0000000000000002
mmr2           -0.000000035068     0.000000008956   -3.915            0.0000902
               
(Intercept) ***
year        ***
condition   ***
odometer    ***
mmr         ***
mmr2        ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1681 on 546970 degrees of freedom
Multiple R-squared:  0.9703,    Adjusted R-squared:  0.9703 
F-statistic: 3.569e+06 on 5 and 546970 DF,  p-value: < 0.00000000000000022
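The squared MMR term is statistically significant, yet the residual standard error (1681) and R-squared (0.9703) are unchanged from the first model, so the practical gain is negligible. A formal way to compare two nested models is an F-test with `anova()`. The sketch below uses simulated data rather than the car data, and writes the squared term directly in the formula with `I()` instead of adding a column; all variable names are assumptions.

```r
set.seed(42)

# Simulated stand-in data (not the car data): y depends almost linearly on x
n <- 500
x <- runif(n, 1000, 50000)
y <- 200 + 0.98 * x + rnorm(n, sd = 1500)
sim <- data.frame(x = x, y = y)

# I() lets the squared term live in the formula, so no extra column is needed
fit_linear    <- lm(y ~ x, data = sim)
fit_quadratic <- lm(y ~ x + I(x^2), data = sim)

# F-test for whether the quadratic term improves on the linear fit
comparison <- anova(fit_linear, fit_quadratic)
print(comparison)

# Change in adjusted R-squared, a rough gauge of practical improvement
adj_r2_gain <- summary(fit_quadratic)$adj.r.squared -
  summary(fit_linear)$adj.r.squared
```

With more than half a million rows, even a tiny effect can produce a small p-value, so it helps to read the test alongside the change in adjusted R-squared.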

Making predictions with the regression model

The predicted prices correlate strongly with the actual selling prices (r ≈ 0.985), so the model tracks car prices closely.

Code
cars$pred <- predict(car_model, cars)
cor(cars$pred, cars$sellingprice)
[1] 0.9850188
Code
plot(cars$pred, cars$sellingprice)
abline(a = 0, b = 1, col = "red", lwd = 3, lty = 2)

Plotting the predicted selling prices against the actual selling prices.
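The correlation above is computed on the same rows that were used to fit the model, which tends to flatter it. A common complement is to hold out part of the data and report the error in dollars. The following is a rough sketch on simulated data, not the car data; the 80/20 split and all object names are assumptions.

```r
set.seed(1)

# Simulated stand-in data (not the car data)
n <- 1000
sim <- data.frame(mmr = runif(n, 1000, 40000))
sim$sellingprice <- 0.98 * sim$mmr + rnorm(n, sd = 1700)

# Hold out 20% of rows that the model never sees during fitting
test_idx <- sample(n, size = 0.2 * n)
train <- sim[-test_idx, ]
test  <- sim[test_idx, ]

fit  <- lm(sellingprice ~ mmr, data = train)
pred <- predict(fit, newdata = test)

# Root mean squared error and mean absolute error, in the units of price
rmse <- sqrt(mean((test$sellingprice - pred)^2))
mae  <- mean(abs(test$sellingprice - pred))
```

Applied to the car data, the same pattern would fit `car_model` on the training rows only and report `rmse` and `mae` for the held-out rows.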

An analysis on SUV prices

Code
suv_brands <- cars |>
  filter(make == "Jeep" | make == "Ford" | make == "Land Rover" | make == "Kia" | make == "Cadillac") |>
  filter(model == "Wrangler"| model == "Range Rover"| model == "Explorer"| model == "Sorento"| model == "Escalade")
head(suv_brands) %>% gt()
year make model trim body transmission vin state condition odometer color interior seller mmr sellingprice saledate mmr2 pred
2015 Kia Sorento LX SUV automatic 5xyktca69fg566472 ca 5 16639 white black kia motors america inc 20500 21500 Tue Dec 16 2014 12:30:00 GMT-0800 (PST) 420250000 19119.32
2015 Kia Sorento LX SUV automatic 5xyktca69fg561319 ca 5 9393 white beige kia motors america inc 20800 21500 Tue Dec 16 2014 12:30:00 GMT-0800 (PST) 432640000 19423.04
2015 Kia Sorento LX SUV automatic 5xyktca66fg561407 ca 5 14634 silver black kia motors america inc 20600 21500 Tue Dec 16 2014 12:30:00 GMT-0800 (PST) 424360000 19220.07
2015 Kia Sorento LX SUV automatic 5xyktca60fg565226 ca 5 13757 red black kia motors america inc 20600 20750 Tue Dec 16 2014 12:30:00 GMT-0800 (PST) 424360000 19221.11
2015 Kia Sorento LX SUV automatic 5xyktca68fg559481 ca 44 12862 gray black kia motors america inc 20700 21000 Tue Dec 16 2014 12:30:00 GMT-0800 (PST) 428490000 20777.89
2015 Kia Sorento LX SUV automatic 5xyktca67fg570973 ca 5 13878 silver black kia motors america inc 20600 20750 Tue Dec 16 2014 12:30:00 GMT-0800 (PST) 424360000 19220.96

A filtered dataset containing the selected SUV models.

Separating the different brands

Code
jeep_p <- cars |>
  filter(make == "Jeep") |>
  filter(model == "Wrangler")
  
  
land_p <- cars |>
  filter(make == "Land Rover") |>
  filter(model == "Range Rover")
  

ford_p <- cars |>
  filter(make == "Ford") |>
  filter(model == "Explorer")
  
  
kia_p <- cars |>
  filter(make == "Kia") |>
  filter(model == "Sorento")
  
cad_p <- cars |>
  filter(make == "Cadillac") |>
  filter(model == "Escalade")

Table of all the values

Code
head(jeep_p) |> gt()
year make model trim body transmission vin state condition odometer color interior seller mmr sellingprice saledate mmr2 pred
2013 Jeep Wrangler Unlimited Rubicon SUV manual 1c4bjwfg4dl585036 ca 47 9153 red gray clear view systems 34300 35250 Tue Dec 16 2014 12:30:00 GMT-0800 (PST) 1176490000 34367.85
2013 Jeep Wrangler Unlimited Sahara SUV automatic 1c4hjweg8dl539625 ca 31 41173 black black avis corporation 27100 27000 Tue Dec 30 2014 15:00:00 GMT-0800 (PST) 734410000 26647.94
2013 Jeep Wrangler Unlimited Sport SUV automatic 1c4hjwdg9dl604791 ca 43 41997 black black avis corporation 24200 25250 Tue Dec 30 2014 15:00:00 GMT-0800 (PST) 585640000 24241.98
2013 Jeep Wrangler Unlimited Sport SUV automatic 1c4hjwdg3dl620646 ca 37 42482 black black avis corporation 24200 25000 Tue Dec 30 2014 15:00:00 GMT-0800 (PST) 585640000 24017.20
2012 Jeep Wrangler Unlimited Sport SUV automatic 1c4hjwdg9cl227569 ca 38 39702 red black avis corporation 24900 25000 Tue Dec 16 2014 12:30:00 GMT-0800 (PST) 620010000 24792.62
2012 Jeep Wrangler Unlimited Sport SUV automatic 1c4bjwdg4cl192882 ca 33 58226 white black the hertz corporation 23100 21800 Thu Dec 18 2014 11:30:00 GMT-0800 (PST) 533610000 22812.87
Code
head(kia_p) |> gt()
year make model trim body transmission vin state condition odometer color interior seller mmr sellingprice saledate mmr2 pred
2015 Kia Sorento LX SUV automatic 5xyktca69fg566472 ca 5 16639 white black kia motors america inc 20500 21500 Tue Dec 16 2014 12:30:00 GMT-0800 (PST) 420250000 19119.32
2015 Kia Sorento LX SUV automatic 5xyktca69fg561319 ca 5 9393 white beige kia motors america inc 20800 21500 Tue Dec 16 2014 12:30:00 GMT-0800 (PST) 432640000 19423.04
2015 Kia Sorento LX SUV automatic 5xyktca66fg561407 ca 5 14634 silver black kia motors america inc 20600 21500 Tue Dec 16 2014 12:30:00 GMT-0800 (PST) 424360000 19220.07
2015 Kia Sorento LX SUV automatic 5xyktca60fg565226 ca 5 13757 red black kia motors america inc 20600 20750 Tue Dec 16 2014 12:30:00 GMT-0800 (PST) 424360000 19221.11
2015 Kia Sorento LX SUV automatic 5xyktca68fg559481 ca 44 12862 gray black kia motors america inc 20700 21000 Tue Dec 16 2014 12:30:00 GMT-0800 (PST) 428490000 20777.89
2015 Kia Sorento LX SUV automatic 5xyktca67fg570973 ca 5 13878 silver black kia motors america inc 20600 20750 Tue Dec 16 2014 12:30:00 GMT-0800 (PST) 424360000 19220.96
Code
head(land_p) |> gt()
year make model trim body transmission vin state condition odometer color interior seller mmr sellingprice saledate mmr2 pred
2012 Land Rover Range Rover HSE SUV automatic salme1d45ca360653 ca 36 56169 black black us bank 41900 39500 Thu Jan 15 2015 04:30:00 GMT-0800 (PST) 1755610000 41425.28
2011 Land Rover Range Rover Supercharged SUV automatic salmf1e49ba336320 ca 39 53634 black black bank of the west 43200 46000 Thu Dec 18 2014 12:00:00 GMT-0800 (PST) 1866240000 42865.50
2011 Land Rover Range Rover HSE SUV automatic salme1d49ba351839 ca 25 31927 black black midway hfc fleet/ars 40400 34600 Thu Jan 08 2015 12:00:00 GMT-0800 (PST) 1632160000 39612.93
2011 Land Rover Range Rover Supercharged SUV automatic salmf1e45ba331132 ca 47 56574 gray black commerce hyundai 43300 42500 Thu Jan 15 2015 04:30:00 GMT-0800 (PST) 1874890000 43259.37
2011 Land Rover Range Rover HSE SUV automatic salmf1d47ba331103 ca 39 47019 black black jpmorgan chase bank n.a. 40100 40500 Thu Dec 18 2014 12:30:00 GMT-0800 (PST) 1608010000 39823.11
2011 Land Rover Range Rover HSE SUV salmf1d42ba346401 ca 41 58109 black gray jpmorgan chase bank n.a. 37000 34500 Thu Jan 15 2015 04:30:00 GMT-0800 (PST) 1369000000 36834.60
Code
head(ford_p) |> gt()
year make model trim body transmission vin state condition odometer color interior seller mmr sellingprice saledate mmr2 pred
2012 Ford Explorer Base SUV automatic 1fmhk7b83cga31940 ca 19 44896 black tan galpin ford studio rentals 16550 19250 Tue Dec 16 2014 12:30:00 GMT-0800 (PST) 273902500 15860.696
2012 Ford Explorer XLT SUV automatic 1fmhk7d86cga36787 ca 46 34434 gray black ford motor credit company llc pd 22000 24300 Thu Dec 18 2014 12:00:00 GMT-0800 (PST) 484000000 22244.370
2012 Ford Explorer Limited SUV automatic 1fmhk8f89cga08890 ca 29 48130 black black galpin ford studio rentals 25800 25500 Tue Dec 16 2014 12:30:00 GMT-0800 (PST) 665640000 25331.913
2012 Ford Explorer Base SUV automatic 1fmhk7b84cga36676 ca 35 37704 blue tan galpin ford studio rentals 17050 18000 Tue Dec 16 2014 12:30:00 GMT-0800 (PST) 290702500 16959.019
2011 Ford Explorer XLT SUV automatic 1fmhk8d88bga57418 ca 38 75162 gray black premium auto wholesale 18350 17500 Tue Dec 30 2014 15:00:00 GMT-0800 (PST) 336722500 18352.106
2008 Ford Explorer XLT SUV 1fmeu63e68ua07815 ca 39 100660 black beige hertz remarketing 7475 6500 Thu Dec 18 2014 12:00:00 GMT-0800 (PST) 55875625 7797.242
Code
head(cad_p) |> gt()
year make model trim body transmission vin state condition odometer color interior seller mmr sellingprice saledate mmr2 pred
2012 Cadillac Escalade Luxury SUV automatic 1gys3bef3cr133032 ca 36 35551 black black fiserv/usb dealer services northstar exchange 36600 36500 Thu Dec 18 2014 12:30:00 GMT-0800 (PST) 1339560000 36234.75
2012 Cadillac Escalade Premium SUV automatic 1gys3cef9cr278064 ca 38 53781 black black fiserv/usb dealer services northstar exchange 35600 35500 Thu Dec 18 2014 12:30:00 GMT-0800 (PST) 1267360000 35304.07
2012 Cadillac Escalade Luxury SUV automatic 1gys4bef5cr127343 ca 3 54683 black beige tdaf remarketing 35000 35750 Tue Dec 16 2014 12:30:00 GMT-0800 (PST) 1225000000 33404.79
2012 Cadillac Escalade Luxury SUV automatic 1gys3bef6cr152612 ca 41 35958 black black fiserv/usb dealer services northstar exchange 36600 39750 Thu Dec 18 2014 12:30:00 GMT-0800 (PST) 1339560000 36421.11
2011 Cadillac Escalade Premium SUV automatic 1gys4cef6br391997 ca 42 40009 gray tan fiserv/usb dealer services northstar exchange 36800 38250 Thu Dec 18 2014 12:30:00 GMT-0800 (PST) 1354240000 36696.51
2009 Cadillac Escalade Base SUV automatic 1gyfc53219r124437 ca 26 72247 blue black caseys cars inc 22600 25000 Tue Dec 30 2014 12:30:00 GMT-0800 (PST) 510760000 22180.87

Predicting the car price for a Kia Sorento while changing a few of the features

Code
predict(car_model, 
        data.frame(year = 2015, make = "Kia", model = "Sorento", trim = "LX", body = "SUV", state = "ca", condition = 40, odometer = 16000, color = "white", interior = "black", mmr = 20500, transmission = "automatic"))
       1 
20427.94 
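This prediction can be reproduced by hand from the coefficients printed by `summary(car_model)`, which makes it clear how each input contributes. The sketch below copies the estimates from that output; the vector name `b` and the variable `manual_pred` are illustrative.

```r
# Coefficients copied from the summary(car_model) output
b <- c(
  intercept = 91508.71828065,
  year      = -46.01845796,
  condition = 37.36764135,
  odometer  = -0.00117848,
  mmr       = 0.98392988
)

# Same inputs as the predict() call above
manual_pred <- b[["intercept"]] +
  b[["year"]]      * 2015 +
  b[["condition"]] * 40 +
  b[["odometer"]]  * 16000 +
  b[["mmr"]]       * 20500

round(manual_pred, 2)  # 20427.94, matching predict()
```

The MMR term contributes about 20,170 of the total, which is why the prediction sits so close to the MMR value itself.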

Changing the year

Code
predict(car_model, 
        data.frame(year = 2016, make = "Kia", model = "Sorento", trim = "LX", body = "SUV", state = "ca", condition = 40, odometer = 16000, color = "white", interior = "black", mmr = 20500, transmission ="automatic"))
       1 
20381.92 

Changing the state

Code
predict(car_model, 
        data.frame(year = 2015, make = "Kia", model = "Sorento", trim = "LX", body = "SUV", state = "tx", condition = 40, odometer = 16000, color = "white", interior = "black", mmr = 20500, transmission ="automatic"))
       1 
20427.94 

Changing the odometer

Code
predict(car_model, 
        data.frame(year = 2015, make = "Kia", model = "Sorento", trim = "LX", body = "SUV", state = "ca", condition = 40, odometer = 16500, color = "white", interior = "black", mmr = 20500, transmission ="automatic"))
       1 
20427.35 

Changing the Manheim Market Report value

Code
predict(car_model, 
        data.frame(year = 2015, make = "Kia", model = "Sorento", trim = "LX", body = "SUV", state = "ca", condition = 40, odometer = 16500, color = "white", interior = "black", mmr = 22500, transmission ="automatic"))
       1 
22395.21 

Changing the interior

Code
predict(car_model, 
        data.frame(year = 2015, make = "Kia", model = "Sorento", trim = "LX", body = "SUV", state = "ca", condition = 40, odometer = 16500, color = "white", interior = "white", mmr = 22500, transmission ="automatic"))
       1 
22395.21 

Changing the color

Code
predict(car_model, 
        data.frame(year = 2015, make = "Kia", model = "Sorento", trim = "LX", body = "SUV", state = "ca", condition = 40, odometer = 16500, color = "blue", interior = "black", mmr = 22500, transmission ="automatic"))
       1 
22395.21 

Changing the transmission

Code
predict(car_model, 
        data.frame(year = 2015, make = "Kia", model = "Sorento", trim = "LX", body = "SUV", state = "ca", condition = 40, odometer = 16500, color = "white", interior = "black", mmr = 22500, transmission ="manual"))
       1 
22395.21 
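Changing the state, interior, color, or transmission leaves the prediction unchanged because `car_model` only contains `year`, `condition`, `odometer`, and `mmr`. `predict()` looks up the variables named in the model formula and ignores any other columns in the new data. The sketch below demonstrates this on simulated data, not the car data; all names are assumptions.

```r
set.seed(7)

# Simulated stand-in data (not the car data)
sim <- data.frame(
  mmr   = runif(200, 5000, 30000),
  color = sample(c("white", "blue"), 200, replace = TRUE)
)
sim$sellingprice <- 0.98 * sim$mmr + rnorm(200, sd = 1000)

# color is present in the data but deliberately left out of the formula
fit <- lm(sellingprice ~ mmr, data = sim)

# Variables the model actually uses
all.vars(formula(fit))

# Two cars that differ only in a column the model ignores
p_white <- predict(fit, data.frame(mmr = 20500, color = "white"))
p_blue  <- predict(fit, data.frame(mmr = 20500, color = "blue"))
identical(p_white, p_blue)
```

To measure the effect of color, state, or transmission on price, those variables would have to be added to the model formula as predictors.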

Some plots to visualize the data

Code
cars |> ggplot(aes(x = condition, y = sellingprice)) + geom_point()

This plot shows the relationship between the condition of the car and its selling price. In general, the better the condition of the car, the higher the price it sells for, although some cars in very poor condition still fetch a high price in the market.

Same as above, split by brand

Code
cars |> 
  filter(make == "Jeep"| make == "Ford"| make == "Land Rover"| make == "Cadillac") |>
  ggplot(aes(x = condition, y = sellingprice, color = make)) + geom_point()

Looking deeper, we restrict the data to a handful of makes. This shows how the brands differ and how much buyers pay for each at different condition ratings. For example, Land Rovers command high prices regardless of condition.

Same with a facet_wrap

Code
cars_face <- cars |> 
  filter(make == "Jeep"| make == "Ford"| make == "Land Rover"| make == "Cadillac"| make == "Kia") |>
  ggplot(aes(x = condition, y = sellingprice, color = make)) + geom_point() + facet_wrap(~make)
cars_face

Using a facet wrap to really isolate the difference in prices based on the conditions among the brands.

Doing the same with the selling prices of the individual car models

Code
suv_brands |>
  ggplot(aes(x = condition, y = sellingprice, color = make)) + geom_point() + facet_wrap(~model)

Looking at the different SUV models and how their prices compare at different conditions.

Box plot of selling prices for the different SUV brands

Code
ggplot(suv_brands, aes(x=make, y=sellingprice)) +
  geom_boxplot()

A boxplot showing the distribution of selling prices for each make. The points beyond the whiskers are outliers.
Code
library(plotly)
library(tidyverse)
p1 <- cars |> 
  # Title-case names match the other filters in this report
  filter(make %in% c("Jeep", "Ford", "Land Rover", "Cadillac", "Kia")) |>
  ggplot(aes(x = condition, y = sellingprice)) + geom_point(aes(color = make, size = odometer,   frame = year, ids = model), alpha = 0.5) + labs(
    x = "The Condition of the car at the time of Sale",
    y = "The Selling Price of the Car",
    color = "Brand of the car",
    size = NULL
    )
Warning in geom_point(aes(color = make, size = odometer, frame = year, ids =
model), : Ignoring unknown aesthetics: frame and ids
Code
ggplotly(p1)
Warning in p$x$data[firstFrame] <- p$x$frames[[1]]$data: number of items to
replace is not a multiple of replacement length

An animated timeline showing how selling price and condition change across model years.

Code
library(plotly)
cars_face |> ggplotly()

The previous facet wrap plot, this time as an interactive plotly graph.